Expose heartbeat default_timeout_secs in config.toml #775

Open
pbranchu wants to merge 1 commit into RightNow-AI:main from pbranchu:config-heartbeat-timeout

@pbranchu

Summary

  • Add a [heartbeat] section to KernelConfig with a default_timeout_secs field (default: 180)
  • Wire start_heartbeat_monitor to read the configured timeout instead of using the hardcoded default
  • Add tests for deserialization and default values
  • Document the new config section in docs/configuration.md
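As a rough sketch of the shape this section might take, the default can live in a `Default` impl so that omitting `[heartbeat]` keeps today's 180s behavior. Only `default_timeout_secs` and the 180 default come from this PR; the `HeartbeatConfig` name and the single-field `KernelConfig` are assumptions for illustration. In a serde-based config, marking the field with `#[serde(default)]` is the usual way to get that fallback during deserialization; the sketch sticks to the standard library.

```rust
/// Hypothetical shape of the new `[heartbeat]` section. The field name comes
/// from the PR; the struct name is assumed for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct HeartbeatConfig {
    /// Seconds of inactivity before an agent is considered unresponsive.
    pub default_timeout_secs: u64,
}

impl Default for HeartbeatConfig {
    fn default() -> Self {
        // Same value as the previously hardcoded timeout, so an omitted
        // section changes nothing.
        HeartbeatConfig { default_timeout_secs: 180 }
    }
}

/// Only the field relevant here; the real KernelConfig has many more.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct KernelConfig {
    pub heartbeat: HeartbeatConfig,
}

fn main() {
    let cfg = KernelConfig::default();
    println!("default_timeout_secs = {}", cfg.heartbeat.default_timeout_secs);

    // In-memory equivalent of `[heartbeat] default_timeout_secs = 600`.
    let custom = KernelConfig {
        heartbeat: HeartbeatConfig { default_timeout_secs: 600 },
    };
    println!("custom default_timeout_secs = {}", custom.heartbeat.default_timeout_secs);
}
```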

Problem

Reactive agents (hands) that sit idle between infrequent requests get marked as crashed after the hardcoded 180-second inactivity timeout. This causes the first request after an idle period to fail with a health-check error, requiring the kernel to recover the agent before it can serve the request.

Users with hands that are called infrequently (e.g., a code-review hand invoked a few times per day) need to increase this timeout without modifying source code.
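A minimal sketch of the inactivity check, assuming the kernel compares the time since an agent's last activity against the timeout. The helper `is_unresponsive` is hypothetical, and whether the real check uses `>` or `>=` is not stated in this PR. It shows why a hand idle for 5 minutes trips the 180s default but not a 600s setting.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: an agent counts as unresponsive once it has been
/// idle for longer than `timeout`.
fn is_unresponsive(last_activity: Instant, now: Instant, timeout: Duration) -> bool {
    // saturating_duration_since returns zero instead of panicking if `now`
    // is earlier than `last_activity`.
    now.saturating_duration_since(last_activity) > timeout
}

fn main() {
    let last_activity = Instant::now();
    // Simulate a hand that has been idle for 5 minutes.
    let now = last_activity + Duration::from_secs(300);

    let hardcoded = Duration::from_secs(180);
    let configured = Duration::from_secs(600);

    println!("180s timeout -> unresponsive: {}", is_unresponsive(last_activity, now, hardcoded));
    println!("600s timeout -> unresponsive: {}", is_unresponsive(last_activity, now, configured));
}
```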

Usage

[heartbeat]
default_timeout_secs = 600   # 10 minutes for infrequently-used hands

The per-agent heartbeat_interval_secs in an agent's autonomous config continues to override this global default.
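The precedence described above can be sketched as follows. This assumes, per the discussion later in this PR, that the per-agent timeout is `heartbeat_interval_secs * UNRESPONSIVE_MULTIPLIER` with a multiplier of 2; the function `effective_timeout_secs` is hypothetical.

```rust
/// Assumed value, taken from the 30 * 2 = 60s example in the PR discussion.
const UNRESPONSIVE_MULTIPLIER: u64 = 2;

/// Hypothetical helper mirroring the override behavior: a per-agent
/// interval, when present, replaces the global default outright.
fn effective_timeout_secs(per_agent_interval_secs: Option<u64>, global_default_secs: u64) -> u64 {
    per_agent_interval_secs.map_or(global_default_secs, |interval| interval * UNRESPONSIVE_MULTIPLIER)
}

fn main() {
    // Agent without an autonomous config: the [heartbeat] value applies.
    println!("{}", effective_timeout_secs(None, 600)); // 600
    // Autonomous agent with heartbeat_interval_secs = 30: 30 * 2 = 60 wins.
    println!("{}", effective_timeout_secs(Some(30), 600)); // 60
}
```

Under these semantics an autonomous agent's interval always wins, even when it yields a shorter timeout than the global setting.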

Test plan

  • cargo test -p openfang-types -- config::tests::test_heartbeat passes (4 new tests)
  • cargo test -p openfang-kernel -- heartbeat::tests::test_heartbeat_config_custom_timeout passes
  • Verify a kernel booted with [heartbeat] default_timeout_secs = 600 logs the custom value
  • Verify omitting the [heartbeat] section preserves the 180s default
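To illustrate the two behaviors the last two checks cover, here is a deliberately simplified, line-based lookup for `default_timeout_secs` under `[heartbeat]`. It is a stand-in for real TOML deserialization (it ignores quoting, inline tables, and most other TOML syntax), the function name is hypothetical, and it is not taken from the PR.

```rust
/// Hypothetical, simplified lookup: returns `default_timeout_secs` from the
/// `[heartbeat]` table, or 180 when the section or key is absent.
/// Not a real TOML parser; `#` inside quoted strings is not handled.
fn heartbeat_timeout_from_toml(src: &str) -> u64 {
    const DEFAULT_TIMEOUT_SECS: u64 = 180;
    let mut in_heartbeat = false;
    for raw in src.lines() {
        // Drop trailing comments and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.starts_with('[') {
            // Entering a new table; track whether it is [heartbeat].
            in_heartbeat = line == "[heartbeat]";
            continue;
        }
        if in_heartbeat {
            if let Some((key, value)) = line.split_once('=') {
                if key.trim() == "default_timeout_secs" {
                    if let Ok(secs) = value.trim().parse::<u64>() {
                        return secs;
                    }
                }
            }
        }
    }
    DEFAULT_TIMEOUT_SECS
}

fn main() {
    let custom = "[heartbeat]\ndefault_timeout_secs = 600   # 10 minutes for infrequently-used hands\n";
    println!("{}", heartbeat_timeout_from_toml(custom)); // 600
    println!("{}", heartbeat_timeout_from_toml("")); // 180
}
```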

🤖 Generated with Claude Code

Add a [heartbeat] section to KernelConfig so users can tune the
inactivity timeout that determines when agents are marked unresponsive.
Reactive agents (hands) that sit idle between infrequent requests were
getting marked as crashed after the hardcoded 180s default, causing
the first request after idle to fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
moritzschmitz-oviva pushed a commit to boxcee/openfang that referenced this pull request Mar 24, 2026
- Vendor native-tls OpenSSL for cross-compilation to aarch64-linux
- Apply PR RightNow-AI#775: expose heartbeat default_timeout_secs in config.toml

Fixes: RightNow-AI#766
@boxcee

boxcee commented Mar 24, 2026

I'm running this PR locally. It also helps with slow local models.

@boxcee

boxcee commented Mar 24, 2026

Nice work on exposing the config! I ran into an issue where default_timeout_secs from config.toml was being overridden by the per-agent heartbeat_interval_secs * UNRESPONSIVE_MULTIPLIER (30 * 2 = 60s), even when the global default was set higher (e.g. 300s).

The fix is to treat default_timeout_secs as a floor in check_agents():

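// Per-agent timeout derived from the autonomous heartbeat interval, if any.
let per_agent = entry_ref.manifest.autonomous.as_ref()
    .map(|a| a.heartbeat_interval_secs * UNRESPONSIVE_MULTIPLIER);
// Never go below the global [heartbeat] default_timeout_secs.
let timeout_secs = per_agent
    .map(|t| t.max(config.default_timeout_secs))
    .unwrap_or(config.default_timeout_secs) as i64;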

Without this, any autonomous agent (including all hands) ignores the config value and uses 60s, which is too aggressive for slow local models with thinking/reasoning enabled.
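A self-contained sketch of the floor behavior suggested above, with the kernel types replaced by plain arguments. `floored_timeout_secs` is a hypothetical name, and `UNRESPONSIVE_MULTIPLIER = 2` is taken from the 30 * 2 = 60s example in the comment.

```rust
/// Assumed from the comment's example (30 * 2 = 60s).
const UNRESPONSIVE_MULTIPLIER: u64 = 2;

/// Hypothetical helper: the per-agent timeout is used only when it exceeds
/// the global default, which acts as a floor.
fn floored_timeout_secs(per_agent_interval_secs: Option<u64>, default_timeout_secs: u64) -> i64 {
    let per_agent = per_agent_interval_secs.map(|interval| interval * UNRESPONSIVE_MULTIPLIER);
    per_agent
        .map(|t| t.max(default_timeout_secs))
        .unwrap_or(default_timeout_secs) as i64
}

fn main() {
    // Hand with heartbeat_interval_secs = 30 and a 300s global default.
    println!("{}", floored_timeout_secs(Some(30), 300)); // 300 (was 60)
    // Agent whose own timeout (300 * 2 = 600) already exceeds the default.
    println!("{}", floored_timeout_secs(Some(300), 300)); // 600
    // Non-autonomous agent: the global default applies.
    println!("{}", floored_timeout_secs(None, 300)); // 300
}
```

With a 300s default, a 30s-interval hand now gets 300s instead of 60s, while an agent whose interval-derived timeout already exceeds the default keeps its longer value.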

2 participants